Machine Code Programming
CPU only sees binary in memory
Instead of writing machine code in binary, assembly is used for ease
- Low level
- Opcodes represented by mnemonics
- Registers have names
- Memory addresses specified with labels
An assembler is used to turn assembly into machine code
- Not the same as compilation of high level code
- Each assembly line directly translates to one machine code instruction
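As a rough illustration of that one-to-one translation, here is a minimal sketch of how a toy assembler might encode a single instruction. It covers only the form "mov register, 32-bit constant", which x86 encodes as the opcode byte B8 plus the register number, followed by the constant in little-endian byte order. The names Reg32 and encode_mov_imm32 are hypothetical, not part of any real assembler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Register numbers used by the x86 encoding for these 32-bit registers.
enum Reg32 : std::uint8_t { EAX = 0, ECX = 1, EDX = 2, EBX = 3 };

// Hypothetical helper: encode "mov <reg>, <imm32>" as machine code bytes.
// Opcode is 0xB8 + register number, then the constant, lowest byte first.
std::vector<std::uint8_t> encode_mov_imm32(Reg32 reg, std::uint32_t imm) {
    std::vector<std::uint8_t> bytes;
    bytes.push_back(static_cast<std::uint8_t>(0xB8 + reg));
    for (int i = 0; i < 4; ++i) {
        bytes.push_back(static_cast<std::uint8_t>((imm >> (8 * i)) & 0xFF));
    }
    return bytes;
}
```

With this sketch, encode_mov_imm32(EAX, 42) gives the five bytes B8 2A 00 00 00. A real assembler handles many more instruction forms and operand types.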
A label can refer to the address of an instruction, or the address of a data item
The mnemonic and its operands are directly translated into machine code
adjust: mov eax, num1 ; put number into register
- Here,
- adjust is a label (pointing to an instruction address)
- mov is the mnemonic for the instruction, followed by two operands, eax and num1 (eax is a register)
- semicolons are used for comments
- instruction labels and comments are optional, so this line can be written basically as
mov eax, num1
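To run this code in C++ (the _asm block is Microsoft Visual C++ inline assembly for 32-bit x86; other compilers use a different syntax):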
#include <stdio.h>
#include <stdlib.h>
int main (void) {
int num = 10; // you can declare variables outside the assembly block
// assembly block:
_asm {
mov eax, num
add eax, 12
mov num, eax
}
return 0;
}
Intel x86 Registers
- Lots of registers, but only some needed for these purposes
- IP and IR are registers that have been mentioned
- Four main general purpose registers:
- EAX - accumulator
- EBX - base register
- ECX - counter register
- EDX - data register
- These have conventional roles, but each can be used for any purpose
- EAX usually used for calculations
- ECX usually used for keeping track of loop iterations
The Accumulator
- RAX - 64 bits of the accumulator
- EAX - lowest 32 bits of the accumulator
- AX - lowest 16 bits of the accumulator
- AH - Upper 8 bits of the AX
- AL - Lower 8 bits of the AX
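The way these names overlap can be sketched in plain C++ by treating the accumulator as one 64-bit value and masking out the relevant bits. This is only a model of the layout listed above; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Views of a 64-bit accumulator value, mirroring the register names above.
std::uint32_t eax_of(std::uint64_t rax) { return static_cast<std::uint32_t>(rax); }      // bits 0..31
std::uint16_t ax_of(std::uint64_t rax)  { return static_cast<std::uint16_t>(rax); }      // bits 0..15
std::uint8_t  ah_of(std::uint64_t rax)  { return static_cast<std::uint8_t>(rax >> 8); }  // bits 8..15
std::uint8_t  al_of(std::uint64_t rax)  { return static_cast<std::uint8_t>(rax); }       // bits 0..7

// Writing AL replaces only the lowest byte and leaves the other bits alone.
std::uint64_t set_al(std::uint64_t rax, std::uint8_t value) {
    return (rax & ~std::uint64_t{0xFF}) | value;
}
```

For example, if the accumulator holds 0x1122334455667788, then EAX is 0x55667788, AX is 0x7788, AH is 0x77 and AL is 0x88.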
Example code:
- Put 42 into accumulator:
mov eax, 42
- Move lowest 16 bits of a variable into accumulator (count is variable label)
mov ax, count
- Move ascii value of 'x' into lowest byte of accumulator
mov al, 'x'
- Increment accumulator
inc eax
- Add 10 to accumulator
add eax, 10
Note: The first operand is the destination, the second operand is the source
A mov cannot have two memory operands, so data cannot be moved directly from memory to memory; it has to pass through a register. Also, a mov copies the source operand: the source is not changed or erased.
Basic Maths
For a basic high level instruction like
int num = count1 + count2 - 10;
In assembly, the accumulator stores the result of each step (accumulating the answer)
mov eax, count1
add eax, count2
sub eax, 10
mov num, eax
Addition and subtraction work as expected. Multiplication takes only one operand, which is the value to multiply the accumulator by. For example, to calculate 10*12:
mov eax, 10
mov ebx, 12
mul ebx
^ This would result in 120 being stored in EAX (mul writes the full 64-bit product to EDX:EAX, so EDX becomes 0 here)
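A minimal C++ model of the 32-bit unsigned multiply, assuming the operand is a 32-bit register as in the example. The 64-bit product is split into a high half (EDX) and a low half (EAX). MulResult and mul32 are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>

// High and low halves of the product, named after the registers that receive them.
struct MulResult {
    std::uint32_t edx;  // high 32 bits
    std::uint32_t eax;  // low 32 bits
};

// Model of "mul <operand>": multiplies EAX by the operand, unsigned.
MulResult mul32(std::uint32_t eax, std::uint32_t operand) {
    std::uint64_t product = static_cast<std::uint64_t>(eax) * operand;
    return { static_cast<std::uint32_t>(product >> 32),
             static_cast<std::uint32_t>(product) };
}
```

Here mul32(10, 12) leaves 120 in the eax field and 0 in edx, while a product that needs more than 32 bits spills into edx.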
Division
- Some things need to be set up first
- Dividend formed from EDX (high 32 bits) and EAX (low 32 bits)
- Divisor stored in another register
- This performs integer division, so there may be a remainder
- Quotient stored in EAX and remainder stored in EDX
For 120/9
mov ebx, 9
mov edx, 0
mov eax, 120
div ebx
The div instruction does not report problems through the status flags: if the quotient is too big to fit in EAX, or division by zero is attempted, the CPU raises a divide error exception
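The division setup can be modelled in C++ as follows. This is a sketch of the 32-bit unsigned case only; DivResult and div32 are hypothetical names, and signalling the two failure cases with C++ exceptions is purely a modelling choice.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Quotient and remainder, named after the registers that receive them.
struct DivResult {
    std::uint32_t eax;  // quotient
    std::uint32_t edx;  // remainder
};

// Model of "div <divisor>" with the dividend held in EDX:EAX.
DivResult div32(std::uint32_t edx, std::uint32_t eax, std::uint32_t divisor) {
    if (divisor == 0) {
        throw std::domain_error("division by zero");
    }
    std::uint64_t dividend = (static_cast<std::uint64_t>(edx) << 32) | eax;
    std::uint64_t quotient = dividend / divisor;
    if (quotient > 0xFFFFFFFFull) {
        throw std::overflow_error("quotient does not fit in EAX");
    }
    return { static_cast<std::uint32_t>(quotient),
             static_cast<std::uint32_t>(dividend % divisor) };
}
```

For the 120/9 example, div32(0, 120, 9) gives a quotient of 13 and a remainder of 3. Forgetting to clear EDX first would make the dividend much larger than intended.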
Status flags
Important flags:
- CF - carry flag - previous operation had a carry from the most significant bit
- ZF - zero flag - previous operation had a zero result
- SF - sign flag - previous operation was positive (0) or negative (1)
- OF - overflow flag - previous operation's signed result was too big to fit in the destination
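To make the four flags concrete, here is a sketch that computes them for a 32-bit add. It models only the flags listed above, and the names Flags and flags_after_add are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

struct Flags {
    bool cf;  // carry out of the most significant bit
    bool zf;  // result is zero
    bool sf;  // most significant bit of the result (1 = negative)
    bool of;  // signed overflow
};

// Model of the flags after "add a, b" on 32-bit values.
Flags flags_after_add(std::uint32_t a, std::uint32_t b) {
    std::uint32_t result = a + b;  // wraps around modulo 2^32
    Flags f;
    f.cf = result < a;  // wrapped past zero, so there was a carry
    f.zf = result == 0;
    f.sf = (result >> 31) != 0;
    // Overflow: both operands have the same sign bit, but the result's differs.
    f.of = ((~(a ^ b) & (a ^ result)) >> 31) != 0;
    return f;
}
```

Adding 1 to 0xFFFFFFFF sets CF and ZF (unsigned wrap to zero), while adding 1 to 0x7FFFFFFF sets SF and OF (the largest positive signed value becomes negative).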
We can use jump instructions to check flags and take appropriate action
Unconditional Jump
An unconditional jump will move the IP to the given address label
mov eax, 10
begin: add eax, 10
jmp begin
The above code is an infinite loop that keeps adding 10 to EAX
Eventually, EAX would get too big and overflow
Jumps are unrestricted, so take care to avoid messy code with jumps all over the place
Conditional Jumps
A conditional jump happens if a certain condition is true
If the condition is false, the IP moves to the next instruction
Jump instructions:
- jc - jump if carry flag
- jnc - jump if no carry flag
- jz - jump if zero flag
- jnz - jump if no zero flag
- js - jump if sign flag
- jns - jump if no sign flag
- jo - jump if overflow flag
- jno - jump if no overflow flag
eg.
num = num - 10;
if (num==0) {
num = 100;
}
This code in assembly would look like:
mov eax, num
sub eax, 10
jnz store
mov eax, 100
store: mov num, eax
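The control flow of the assembly above can be mirrored line for line in C++ using goto, which behaves much like a jump to a label. This is only a sketch to show which instructions get skipped; the function name is hypothetical, and ordinary C++ would use the if statement shown earlier.

```cpp
#include <cassert>

// Mirrors the assembly: subtract 10, and if the result is zero replace it with 100.
int subtract_ten_or_reset(int num) {
    int eax = num;              // mov eax, num
    eax -= 10;                  // sub eax, 10
    if (eax != 0) goto store;   // jnz store (result not zero: skip the next line)
    eax = 100;                  // mov eax, 100
store:
    num = eax;                  // store: mov num, eax
    return num;
}
```

Calling it with 10 takes the fall-through path and returns 100; calling it with 15 takes the jump and returns 5.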
Comparing Values
- The cmp instruction compares two values
- Internally, it subtracts the second operand from the first and sets the flags, then discards the result so neither operand is changed
- If both values are the same, the zero flag is set
cmp eax, ebx
By placing this before a jump instruction, we can branch on the result of the comparison:
- je - jump if operands are equal
- jne - jump if operands are not equal
- jg/jnle - jump if the first operand is greater
- jle/jng - jump if the first operand is less than or equal
- jl/jnge - jump if first operand is less than
- jge/jnl - jump if first operand is greater than or equal
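These jumps read the flags left by the compare (jg, jl, jge and jle treat the operands as signed values), so they only work as expected if no other flag-changing instruction comes between the compare and the jump; normally they immediately follow it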
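As a sketch of how these jumps relate to the flags, the model below performs the internal subtraction for signed 32-bit operands and then tests the flags the way the signed conditional jumps do. The struct and function names are invented for the example; only the flag conditions themselves come from the x86 definition.

```cpp
#include <cassert>
#include <cstdint>

struct CmpFlags {
    bool zf;  // result was zero
    bool sf;  // sign bit of the result
    bool of;  // signed overflow during the subtraction
};

// Model of "cmp a, b": compute a - b, keep only the flags.
CmpFlags cmp(std::int32_t a, std::int32_t b) {
    std::uint32_t ua = static_cast<std::uint32_t>(a);
    std::uint32_t ub = static_cast<std::uint32_t>(b);
    std::uint32_t diff = ua - ub;  // wraps around modulo 2^32
    CmpFlags f;
    f.zf = diff == 0;
    f.sf = (diff >> 31) != 0;
    // Overflow: operands differ in sign and the result's sign differs from a's.
    f.of = (((ua ^ ub) & (ua ^ diff)) >> 31) != 0;
    return f;
}

// Conditions tested by the jump instructions.
bool je(CmpFlags f)  { return f.zf; }
bool jg(CmpFlags f)  { return !f.zf && f.sf == f.of; }
bool jge(CmpFlags f) { return f.sf == f.of; }
bool jl(CmpFlags f)  { return f.sf != f.of; }
bool jle(CmpFlags f) { return f.zf || f.sf != f.of; }
```

For instance, jg(cmp(5, 3)) is true and jl(cmp(-1, 1)) is true. Comparing SF with OF, rather than looking at SF alone, keeps the answer right even when the subtraction overflows.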
If-Else in Assembly
if (num>0){
pos = pos+num;
} else {
neg = neg+num;
}
would be
mov eax, num
cmp eax, 0
jg postv
negtv: add neg, eax
jmp endif
postv: add pos, eax
endif: ..
..
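The same jump structure can be sketched in C++ with goto, to show that exactly one of the two branches runs. Totals and add_by_sign are hypothetical names, and the label is called endif_label here only to keep it distinct from anything else.

```cpp
#include <cassert>

struct Totals {
    int pos;
    int neg;
};

// Mirrors the assembly: add num to pos if it is greater than zero, else to neg.
Totals add_by_sign(int num, Totals totals) {
    int eax = num;              // mov eax, num
    if (eax > 0) goto postv;    // cmp eax, 0 then jg postv
    totals.neg += eax;          // negtv: add neg, eax
    goto endif_label;           // jmp endif (skip the "if" branch)
postv:
    totals.pos += eax;          // postv: add pos, eax
endif_label:
    return totals;              // endif: execution continues from here
}
```

Without the jmp endif line, the negative case would fall through into the postv code and add the number to both totals.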
Loops
- Can loop over instructions by jumping backwards
- ECX can be used in conjunction with the loop instruction
- Load the number of iterations into ECX (e.g. 10)
- At the end of the code to iterate over, add loop [label], where label is the label of the first line of code of the loop
- The loop instruction first decrements ECX by one, and then jumps to the label if ECX is not zero
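The decrement-then-test behaviour corresponds to a do-while loop in C++. The sketch below counts how often the body runs for a given starting ECX; count_loop_iterations is a made-up helper for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Counts how many times the loop body runs for a given starting value of ECX.
std::uint64_t count_loop_iterations(std::uint32_t ecx) {
    std::uint64_t iterations = 0;
    do {
        ++iterations;       // the code to iterate over
        --ecx;              // loop: decrement ECX first...
    } while (ecx != 0);     // ...then jump back if ECX is not zero
    return iterations;
}
```

Starting with ECX at 10 runs the body 10 times. Because the test comes after the body, the body always runs at least once, and starting with ECX at 0 wraps around to a very large count rather than skipping the loop.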
Labels & Memory Addresses
- A label just points to a memory address
- In the C++ code, we declare a variable name and optionally give it a value
int age = 21;
- In the assembly code, we can use the label to refer to the variable in memory
mov eax, age
- But if you want the memory address of the variable, not its value, you use lea (Load Effective Address)
lea ebx, age
- And if we have a memory address stored in a register, we can use register indirect mode to get the value stored in that location
mov eax, [ebx]
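In C++ terms, lea is close to taking the address of a variable, and register indirect mode is close to dereferencing a pointer. A minimal sketch, with a hypothetical function name:

```cpp
#include <cassert>

// Reads a value through its address, mirroring the two assembly lines above.
int load_via_address(const int& age) {
    const int* ebx = &age;  // lea ebx, age   (address of the variable)
    int eax = *ebx;         // mov eax, [ebx] (value stored at that address)
    return eax;
}
```

Passing a variable holding 21 returns 21: the pointer holds where the value lives, and the dereference fetches the value itself.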
Arrays
- Arrays are just items stored in consecutive memory locations
- The amount of memory depends on the data being stored in the array
- In a 32-bit system (eg. Intel x86), each integer takes up 4 bytes of memory
int grades[4] = {64, 78, 60, 55};
- We first get the memory address of the array:
lea ebx, grades
- To get the value stored in the second array item, we add 4 to the address
add ebx, 4
mov eax, [ebx]
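The byte-offset arithmetic can be sketched in C++ by stepping through the array's memory one byte at a time. The sketch uses sizeof(int) rather than a literal 4, so it also works where an int is not 4 bytes; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Reads the int stored byte_offset bytes after the start of the array.
int load_int_at_byte_offset(const int* base, std::size_t byte_offset) {
    const unsigned char* address =
        reinterpret_cast<const unsigned char*>(base) + byte_offset;
    int value;
    std::memcpy(&value, address, sizeof value);  // copy the bytes at that address
    return value;
}

// The second item lives one int's worth of bytes past the start.
int second_grade() {
    const int grades[4] = {64, 78, 60, 55};
    return load_int_at_byte_offset(grades, sizeof(int));
}
```

second_grade() returns 78. In normal C++ the compiler does this scaling for you, so grades[1] or *(grades + 1) means the same thing.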
Array Processing
- With this, we can loop through an array and sum its contents
- In the C++ part of the code, we define an array with 4 items:
int grades[4] = {64, 78, 60, 55};
- In the assembly code we set up the loop
lea ebx, grades ; load memory location of array into ebx
mov ecx, 4 ; set loop counter to 4
mov eax, 0 ; set eax to 0
floop: add eax, [ebx] ; add value in current memory location in array to eax
add ebx, 4 ; go to next memory location in array
loop floop ; loop with ecx
- This uses 3 registers
- EAX - stores the sum as we go along
- EBX - stores the memory address of the current item in the array
- ECX - loop counter
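The same loop written in plain C++, with one variable standing in for each register. This is a sketch of the idea rather than a replacement for the inline assembly; sum_grades is a hypothetical name.

```cpp
#include <cassert>

// Sums the array using a running total, a moving pointer and a countdown.
int sum_grades() {
    int grades[4] = {64, 78, 60, 55};
    const int* ebx = grades;   // lea ebx, grades
    unsigned ecx = 4;          // mov ecx, 4
    int eax = 0;               // mov eax, 0
    do {
        eax += *ebx;           // floop: add eax, [ebx]
        ++ebx;                 // add ebx, 4 (pointer steps by one int)
    } while (--ecx != 0);      // loop floop
    return eax;
}
```

For the grades above the result is 257.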
Subroutines
Parameter passing is tricky in assembly
Subroutine calls just change the IP (but getting back where you were before is tricky)
No local registers in subroutines
No fancy way to specify parameters and their types
Use the call instruction with the label of the first line of the subroutine
Use the ret instruction to return from the subroutine
If you don't return, the IP will just go to the next instruction in memory
When a subroutine is called, the IP is changed to its address
- Fetch-execute cycle continues with instructions from that point onward
- The ret instruction changes the IP back to the address following the original call instruction
- So the CPU must remember where to return to
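One way to picture the "remember where to return to" step is a toy machine that saves the address of the next instruction whenever it executes a call. The sketch below assumes the saved addresses are kept on a stack, which is how x86 does it, but the instruction set, the names (Op, Instr, run, demo) and the program are all invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Op { Call, Ret, Inc, Halt };

struct Instr {
    Op op;
    std::size_t target;  // only used by Call
};

// Runs a toy program and returns how many Inc instructions were executed.
int run(const std::vector<Instr>& program) {
    std::vector<std::size_t> return_addresses;  // assumption: kept on a stack
    std::size_t ip = 0;
    int counter = 0;
    while (ip < program.size()) {
        const Instr& instr = program[ip];
        switch (instr.op) {
        case Op::Call:
            return_addresses.push_back(ip + 1);  // remember the next instruction
            ip = instr.target;                   // jump to the subroutine
            break;
        case Op::Ret:
            if (return_addresses.empty()) return counter;  // nothing to return to
            ip = return_addresses.back();        // go back to just after the call
            return_addresses.pop_back();
            break;
        case Op::Inc:
            ++counter;
            ++ip;
            break;
        case Op::Halt:
            return counter;
        }
    }
    return counter;
}

int demo() {
    std::vector<Instr> program = {
        {Op::Call, 3},  // 0: call sub
        {Op::Inc, 0},   // 1: runs after the subroutine returns
        {Op::Halt, 0},  // 2: stop
        {Op::Inc, 0},   // 3: sub: first line of the subroutine
        {Op::Inc, 0},   // 4
        {Op::Ret, 0},   // 5: back to instruction 1
    };
    return run(program);
}
```

demo() executes the instructions in the order 0, 3, 4, 5, 1, 2 and returns 3. Without the Halt at index 2, execution would run straight on into the subroutine's code, which is the "IP will just go to the next instruction in memory" behaviour noted above.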